Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the actual XRay remote sampler which polls rules and orders / app… #3343

Merged
merged 4 commits into from
Jun 27, 2021

Conversation

anuraaga
Copy link
Contributor

…lies them.

This implements the core part of XRay remote sampling, which polls rules and applies a fixed rate based on matching rules. This is the most important part, but in followups I will also be adding support for XRay's "centralized reservoir" (where the server handles the reservoir quotas instead of being fixed per client, this is what the GetSamplingTargets RPC is used for). @jkwatson You aint seen complicated yet :P

@codecov
Copy link

codecov bot commented Jun 24, 2021

Codecov Report

Merging #3343 (256b441) into main (fb7e8d2) will increase coverage by 18.46%.
The diff coverage is n/a.

❗ Current head 256b441 differs from pull request most recent head 2254dd7. Consider uploading reports for the commit 2254dd7 to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##             main    #3343       +/-   ##
===========================================
+ Coverage        0   18.46%   +18.46%     
- Complexity      0      721      +721     
===========================================
  Files           0      351      +351     
  Lines           0     9618     +9618     
  Branches        0      972      +972     
===========================================
+ Hits            0     1776     +1776     
- Misses          0     7657     +7657     
- Partials        0      185      +185     
Impacted Files Coverage Δ
...emetry/exporter/logging/LoggingMetricExporter.java 0.00% <0.00%> (ø)
...er/logging/otlp/OtlpJsonLoggingMetricExporter.java 0.00% <0.00%> (ø)
.../java/io/opentelemetry/sdk/internal/TestClock.java 0.00% <0.00%> (ø)
...io/opentelemetry/context/ImplicitContextKeyed.java 100.00% <0.00%> (ø)
...sion/trace/jaeger/sampler/JaegerRemoteSampler.java 80.55% <0.00%> (ø)
.../metrics/aggregator/DoubleHistogramAggregator.java 0.00% <0.00%> (ø)
...s/AbstractDoubleAsynchronousInstrumentBuilder.java 0.00% <0.00%> (ø)
...pentelemetry/api/metrics/DefaultMeterProvider.java 0.00% <0.00%> (ø)
...er/otlp/trace/InstrumentationLibraryMarshaler.java 0.00% <0.00%> (ø)
...src/main/java/io/opentelemetry/api/trace/Span.java 31.70% <0.00%> (ø)
... and 341 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb7e8d2...2254dd7. Read the comment docs.

return new AwsXrayRemoteSamplerBuilder(resource);
}

AwsXrayRemoteSampler(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: this is my defensive thinking, but as the constructor is package-protected and thus accessible, I'd validate constructor parameters here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they're all validated in the builder


pollFuture =
executor.scheduleAtFixedRate(
this::getAndUpdateSampler, 0, pollingIntervalNanos, TimeUnit.NANOSECONDS);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that pooling interval should have a sanity check - that is if the value is outside of typical, expected values, a warning should be logged. For me it's easy to imagine sb setting interval to 1 (assuming it's one second) and degrading the performance with request each nanosecond ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like all the validation is down in the builder.


// In practice, X-Ray always returns a Default rule that matches all requests so it is a bug in
// our code or X-Ray to reach here, fallback just in case.
logger.log(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. In the case this happens, it will totally spam the logs at WARNING for every span. I know it shouldn't ever happen, but should we maybe not be so aggressive with the logging here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True that the fallback loses its meaning if it adds destructive spam on every single span, turned it to FINE (thought about throttling logger but will lose access to it in contrib repo 😿 )

@jkwatson
Copy link
Contributor

You aint seen complicated yet :P

Hm. If that's the case, I'm definitely starting to question if this repository is the right place to host the code. Can AWS publish a library with all this code in it that we can just depend on, rather than having it all live here (as well in wherever it lives for the old AWS java instrumentation)?

@anuraaga
Copy link
Contributor Author

I think code for otel components that interface with vendors generally go in an otel contrib repo. We've been treating the sdk-extensions as a sort of contrib but maybe we should move to java-contrib? Not sure if it makes a difference in the actual scheme of things though - if the issue is review load a better option (though not mutually exclusive) is probably to add @kubawach to approvers finally :)

@jkwatson
Copy link
Contributor

jkwatson commented Jun 24, 2021

I think code for otel components that interface with vendors generally go in an otel contrib repo. We've been treating the sdk-extensions as a sort of contrib but maybe we should move to java-contrib? Not sure if it makes a difference in the actual scheme of things though - if the issue is review load a better option (though not mutually exclusive) is probably to add @kubawach to approvers finally :)

The issue isn't review load (but I'm also in favor of getting @kubawach added to approvers), but potential maintenance & support load over the long term. If you are the only person who is really able to maintain the code, and debug issues with its interaction with AWS services, then that feels like the core implementation details probably don't belong here. A simple AWS propagator is one thing, but when we're talking about maintaining several thousand lines of non-trivial code for an AWS-specific sampler implementation, I'm probably against it.

this.resource = resource;
this.initialSampler = initialSampler;
client = new XraySamplerClient(endpoint);
executor =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, I'd just assign this in the declaration above, since it's final and doesn't depend on params.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to like either all assigned as fields or all in constructor, otherwise it seems like an ugly duck field

Copy link
Contributor

@jkwatson jkwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good enough to merge, then move over to the contrib repo.

Copy link

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

Speaking for myself (and my employer), we have a strong interest in being able to approximately count sampled spans. I am also confident we can do this counting correctly and with a specification, as discussed in open-telemetry/oteps#148. From this perspective, it's the "Priority", "FixedRate", and "ReservoirSize" that are involved in sampling probability. As discussed in OTEP 148 Samplers are capable of producing probabilities for fixed rate sampling, we'll just have to replace Leaky-bucket rate limiting with Varopt reservoir sampling.

I'd like to draw a non-blocking connection between the XRay sampler and the discussion about Views taking place in the Metrics SIG. Sampler configurations are like views on spans, and as long as we're able to count them, they are also views that can produce metrics. The big question ia whether this configuration syntax can become a standard for sampling in OTel's default SDKs, and if so--how is configuration is supplied?

@anuraaga
Copy link
Contributor Author

@jmacd Thanks - I agree that any rate limiting can be made compatible with the need for counting spans by tweaking the algorithm, perhaps to Varopt. In the meantime I'll be starting with the same as Jaeger but will be happy to change in the future.

The big question ia whether this configuration syntax can become a standard for sampling in OTel's default SDKs, and if so--how is configuration is supplied?

Configuration is basically just data, not that much unlike even spans - just the data that's sent is about configuring an app, not an event. I'd expect we define a proto and transmit it over gRPC just like OTLP, there doesn't seem to be a reason to break from convention. What's in the proto will be a finer nuance - I wouldn't expect to crib much from the X-Ray data model for sampling since it's not very intuitive / well designed I think. In particular, reasoning out the result of having both rate limiting and fixed rate possible at the same time is mind blowing. Adding attribute matching to the jaeger data model is probably a better start.

@anuraaga anuraaga merged commit f83a953 into open-telemetry:main Jun 27, 2021
This was referenced Dec 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants